Keyword Weight Propagation for Indexing Structured Web Content
Abstract
When documents are atomically structured, keyword vectors can be assigned to them directly to support indexing. Most web content, however, has a non-atomic structure, such as the navigational and semantic hierarchies found on the web. Although such structures are especially effective for browsing, they make it hard to index individual nodes properly, because in many cases a node's content has to be inferred from the contents of its neighbors, ancestors, and descendants in the structure. In this paper, we propose a novel keyword and keyword weight propagation technique to properly enrich the data nodes in structured content. In particular, our approach first relies on understanding the context provided by the relative content relationships between entries in the structure. We then leverage this information for relative-content preserving keyword propagation. Experiments show a significant improvement (10–15%) in precision with the proposed keyword propagation algorithm.
Similar resources
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this e...
ChemDig: new approaches to chemically significant indexing and searching of distributed web collections
We describe an extension of the ht://Dig robot-based internet indexing and search engine to include the retrieval of information included in a variety of molecular data formats as defined by chemical MIME types. This is achieved by invoking chemical meta-parsers, software agents designed to provide key meta-data information about the content of the external chemical files. This meta-data can in...
Effective Searching in Structured Data
Keyword search is the mechanism of choice for information discovery and retrieval due to the enormous success of Internet search engines. In fact, nearly half of Internet users perform at least one search daily. The keyword search paradigm regrettably does not extend to similar forms of content, particularly semistructured and relational data. Searching structured content is difficult because s...
Answering Structured Queries on Unstructured Data
There is a growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source discovery and categorization, indexing and some forms of recovery. One of the key services of a DSSP ...
A Template-Based Approach to Keyword Search over Semantic Data
Keyword search is receiving a lot of attention not only in Web contexts but also in the database area. It is an easy way to allow inexperienced users to query systems without needing to know any specific language or how the data is structured. As a matter of fact, the amount of data available, in the Web as well as in other systems, is constantly increasing. And, with the improvements and the si...
Publication date: 2006